XML provides an extraordinarily flexible set of structures that can hold many different types of information, from highly structured database tables and lists to much more free-form documents. The tightly defined requirements for XML documents ensure that all applications play by the same rules when they read (technically, parse) and write documents. The XML 1.0 specification leaves very little room for conflicting syntactic interpretations of the same XML document, allowing the exchange of documents across a wide variety of platforms, applications, and development environments. Newer schema languages provide even more powerful tools for describing information and simplifying information exchange.
Reading documents the same way every time, even in different environments, opens incredible new possibilities for networked information sharing, but it only goes part of the way. Using XML's tight syntax effectively requires common vocabularies that allow programs and people to understand the meaning of documents, not simply parse them into elements and attributes. A formal description of the vocabulary and structures used in a document allows all parties to the transaction to share a common understanding of the document contents. XML includes tools that allow developers and authors to create common structures called schemas that can help provide meaning as well as syntax. Schemas help build networks of understanding, acting as contracts between all parties. Understanding how schemas model data will help you understand your documents better, producing cleaner and more reliable structures for communication.
A schema describes the vocabulary and structures that may appear in a document conforming to that schema. Schemas use their own formal grammars to express document structures and vocabulary. Documents that use the same schema may have markedly different contents but can share common processing. A schema for invoices, for example, would describe a class of documents with very different contents (sender, recipient, rates and prices, services and goods, and, of course, total) that nonetheless share the same basic structure and can be processed by generic tools for handling invoices. Applications check each document against the schema and process it only if it passes inspection (more commonly called validation). This way, applications don't need to provide extensive error handling or implement complex logic for determining the structure of a variety of different invoice formats. The schema allows applications to coordinate their activities safely and relatively easily.
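The "validate first, then process" pattern can be sketched in a few lines of Python using the standard-library `xml.etree.ElementTree` module. This is a minimal illustration under stated assumptions, not a real schema processor: the "schema" here is a hypothetical dictionary (`INVOICE_SCHEMA`) listing only the root element and the children every invoice must contain, and the function names `validate` and `process_invoice` are invented for the example. A real DTD or schema language expresses much more.

```python
import xml.etree.ElementTree as ET

# Hypothetical, greatly simplified "schema" for invoices: the root element
# name and the child elements every invoice must contain. A real schema
# language (a DTD or XML Schema) can say far more than this.
INVOICE_SCHEMA = {"root": "invoice",
                  "required": ["sender", "recipient", "item", "total"]}


def validate(root, schema):
    """Return a list of problems; an empty list means the document is valid."""
    errors = []
    if root.tag != schema["root"]:
        errors.append(f"expected root <{schema['root']}>, found <{root.tag}>")
    present = {child.tag for child in root}
    for name in schema["required"]:
        if name not in present:
            errors.append(f"missing required element <{name}>")
    return errors


def process_invoice(xml_text):
    """Parse, validate, and only then hand the invoice to business logic.

    Malformed XML raises ET.ParseError before validation is reached."""
    root = ET.fromstring(xml_text)
    errors = validate(root, INVOICE_SCHEMA)
    if errors:
        # Reject the document rather than guess at its structure.
        return {"accepted": False, "errors": errors}
    # Validation passed, so this code can rely on <total> being present.
    return {"accepted": True, "total": root.findtext("total")}


good = ("<invoice><sender>A</sender><recipient>B</recipient>"
        "<item>Widget</item><total>12.50</total></invoice>")
bad = "<invoice><sender>A</sender><total>12.50</total></invoice>"
print(process_invoice(good))  # {'accepted': True, 'total': '12.50'}
print(process_invoice(bad))   # rejected: <recipient> and <item> are missing
```

The processing code after the validation check contains no defensive branches for missing elements; that is the simplification the schema buys.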
Schemas provide constraints that documents must meet to be considered 'valid' and therefore safely processable. Because a schema is itself a formal vocabulary that software can process and repurpose, those constraints can serve a number of different applications. Editing software like XML Instance can read a schema and use it to support document authors, presenting them with acceptable choices along the way or even building an entire interface (like an entry form) around the contents of the schema. Applications that exchange documents can use the schema to double-check each other's work and make certain that every participant in the exchange is playing by the same set of rules. When errors appear, the receiving application can report them back to the sender and, ideally, have them corrected. Schemas add a safety net on top of the core XML document structure, making it much simpler to exchange information reliably.
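One way an editor could offer authors "acceptable choices" is to walk the schema's content model and report which elements may legally come next. The Python sketch that follows illustrates the idea only; it is not how XML Instance works. It assumes a hypothetical invoice content model written as a simple sequence with DTD-style occurrence indicators, and it relies on each element name appearing only once in that sequence, so a greedy left-to-right match is unambiguous. The names `INVOICE_MODEL` and `next_choices` are invented for the example.

```python
# Hypothetical content model for <invoice>, written as a sequence of
# (element name, occurrence) pairs using DTD-style indicators:
# "1" = exactly one, "?" = optional, "+" = one or more, "*" = zero or more.
INVOICE_MODEL = [("sender", "1"), ("recipient", "1"), ("note", "?"),
                 ("item", "+"), ("total", "1")]

MIN = {"1": 1, "?": 0, "+": 1, "*": 0}
MAX = {"1": 1, "?": 1, "+": None, "*": None}  # None means unbounded


def next_choices(model, children):
    """Return the element names an editor could offer after `children`,
    or None if `children` already violates the model.

    Assumes each name appears once in the sequence, so a greedy
    left-to-right match is unambiguous."""
    i = 0  # index of the next unmatched child
    for pos, (name, occ) in enumerate(model):
        count = 0
        # Consume as many consecutive `name` children as the indicator allows.
        while (i < len(children) and children[i] == name
               and (MAX[occ] is None or count < MAX[occ])):
            i += 1
            count += 1
        if i == len(children):
            # Every existing child is matched; work out what may come next.
            choices = []
            if MAX[occ] is None or count < MAX[occ]:
                choices.append(name)  # this element may still (re)appear
            if count < MIN[occ]:
                return choices  # a required element must come first
            for later_name, later_occ in model[pos + 1:]:
                choices.append(later_name)
                if MIN[later_occ] > 0:
                    break  # nothing may be skipped past a required element
            return choices
        if count < MIN[occ]:
            return None  # a required element is missing
    # Model exhausted: valid only if no children are left over.
    return [] if i == len(children) else None


print(next_choices(INVOICE_MODEL, ["sender", "recipient"]))  # ['note', 'item']
```

An editor built this way would grey out every element not in the returned list, and a `None` result signals that the document already needs correction.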
Another key task schemas help manage is the integration of documents and data. XML provides a framework in which both document and data structures can co-exist. Data may dominate a document (as when a document represents a relational database table) or appear as fragments scattered among document structures. When data and documents from multiple sources must merge into a single document, schemas can smooth the process by making sure the inputs are what they claim to be and that the output is delivered as it should be. XML Instance is the ideal tool for editing data-oriented files.
In many cases, describing document structure (the vocabulary that identifies different parts of a document, and how those parts fit together) is all that's needed. XML 1.0's document type definitions (DTDs) provide a fairly complete set of tools for describing the parts of a document (elements), annotating those parts (attributes), and constraining what can appear within elements and attributes (content models and attribute types).
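Content models and attribute types can be thought of as two kinds of checks applied to every element. The Python sketch that follows is a minimal illustration, assuming hypothetical declarations for an `<invoice>` element that are written out as DTD lines in a comment and then hand-translated: the content model becomes a regular expression over child element names, and the attribute declaration becomes a required, enumerated value. It ignores text content and is not a DTD parser; `CONTENT_MODELS`, `ATTRIBUTES`, `check_element`, and `check_tree` are names invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declarations, loosely modeled on these DTD lines:
#   <!ELEMENT invoice (sender, recipient, item+, total)>
#   <!ATTLIST invoice currency (USD|EUR|GBP) #REQUIRED>
# Each content model is translated by hand into a regular expression over
# the element's child names, each name followed by a single space.
CONTENT_MODELS = {"invoice": r"(sender )(recipient )(item )+(total )"}
ATTRIBUTES = {
    "invoice": {"currency": {"required": True, "values": {"USD", "EUR", "GBP"}}},
}


def check_element(elem):
    """Check one element's children and attributes against its declarations."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{elem.tag}> content '{children.strip()}' "
                          "does not match its content model")
    for attr, rule in ATTRIBUTES.get(elem.tag, {}).items():
        value = elem.get(attr)
        if value is None:
            if rule["required"]:
                errors.append(f"<{elem.tag}> is missing required attribute {attr}")
        elif value not in rule["values"]:
            errors.append(f"<{elem.tag}> {attr}='{value}' is not an allowed value")
    return errors


def check_tree(root):
    """Apply check_element to every element in document order."""
    return [error for elem in root.iter() for error in check_element(elem)]


doc = ET.fromstring('<invoice currency="EUR"><sender/><recipient/>'
                    '<item/><item/><total/></invoice>')
print(check_tree(doc))  # []
```

Treating a content model as a regular expression over child names is a convenient mental model: the `+`, `?`, and `*` indicators in a DTD content model play the same roles they do in a regular expression.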
Schemas describe which features are allowed to appear in particular document types and which are not. This framework can then provide a solid foundation for presentation, storage, and interchange, assisting with document creation and editing along the way. Schemas give document authors and programmers a common set of expectations, making it possible to create smarter applications that support document authors more completely. Once a document has been created, the schema can provide a roadmap to the document that other applications (like search engines, document management systems, or presentation tools) can use to help them find and manage the information inside the document.
Many applications need to know much more about the information in a document than what goes where and what it's called. Data-oriented applications also want to know whether the content inside an element is an integer, a string, a database key value, a currency value, a date, a boolean (true or false) value, or any of a number of other possibilities. Data-oriented schemas provide an extra layer of information that lets an application hand more of the job of identifying and verifying data types to a validation component. By adding data typing to document structures, schemas become useful for a much broader set of applications. Document structures remain important, but now it becomes possible to define documents like invoices, with their dates, quantities, and currencies, both abstractly and completely. Purely data-focused applications, like database interchange, gain an extra level of processing security: a value that must be a date or an integer can be checked before it ever reaches the receiving system.
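Handing type checks to a separate validation step might look something like the following Python sketch. It is a minimal illustration under loudly stated assumptions: the mapping from element names to types (`INVOICE_TYPES`) is hypothetical, and the lexical formats it accepts (`YYYY-MM-DD` dates, currency amounts with exactly two decimal places, `true`/`false` booleans) were chosen for the example rather than taken from any particular schema language.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def is_date(text):
    """Accept YYYY-MM-DD strings that name a real calendar date."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)  # rejects impossible dates such as Feb 30
    except ValueError:
        return False
    return True


# Assumed lexical rules for each datatype name (illustrative only).
DATATYPES = {
    "integer": lambda t: re.fullmatch(r"[+-]?\d+", t) is not None,
    "currency": lambda t: re.fullmatch(r"\d+\.\d{2}", t) is not None,
    "date": is_date,
    "boolean": lambda t: t in ("true", "false"),
}

# Hypothetical typed schema: element name -> datatype name.
INVOICE_TYPES = {"issued": "date", "quantity": "integer",
                 "price": "currency", "paid": "boolean"}


def check_types(root, types):
    """Return one message per element whose text is not a valid value of its type."""
    errors = []
    for elem in root.iter():
        type_name = types.get(elem.tag)
        if type_name is None:
            continue  # untyped elements are left to structural validation
        text = (elem.text or "").strip()
        if not DATATYPES[type_name](text):
            errors.append(f"<{elem.tag}> value '{text}' is not a valid {type_name}")
    return errors


doc = ET.fromstring("<invoice><issued>2000-03-15</issued><quantity>3</quantity>"
                    "<price>19.95</price><paid>false</paid></invoice>")
print(check_types(doc, INVOICE_TYPES))  # []
```

Once this step passes, the application can convert `quantity` to an integer or `issued` to a date without wrapping each conversion in its own error handling.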
XML is an opportunity to start treating documents and data as partners rather than as separate worlds. XML schemas are a key tool for realizing this promise, creating structures that clearly identify document structure and data types within that structure. XML schemas can provide a map to your documents, letting you mix documents and data freely without ever locking the information into a document format that can't be used later for data extraction and processing. Data can be data, documents can be documents, and the two can be integrated without losing information. While XML has great promise in both document and data processing, its unique ability to bring the two together makes it far more powerful.
Copyright 2000 Extensibility, Inc.
Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516